(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 1 209 662 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 29.05.2002 Bulletin 2002/22
(51) Int Cl.7: G10L 15/26
(21) Application number: 01309945.2
(22) Date of filing: 27.11.2001
(84) Designated Contracting States:
     AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR
     Designated Extension States:
     AL LT LV MK RO SI
(30) Priority: 27.11.2000 JP 2000360203
(71) Applicant: CANON KABUSHIKI KAISHA
     Tokyo (JP)
(72) Inventors:
     • Kushida, Akihiro, c/o Canon Kabushiki Kaisha
       Ohta-ku, Tokyo (JP)
     • Kosaka, Tetsuo, c/o Canon Kabushiki Kaisha
       Ohta-ku, Tokyo (JP)
(74) Representative: Beresford, Keith Denis Lewis et al
     BERESFORD & Co.
     2-5 Warwick Court,
     High Holborn
     London WC1R 5DH (GB)

(54) Client-server based speech recognition



(57) A user dictionary, which is formed by storing pronunciations and notations of target recognition words designated by the user in correspondence with each other, input speech recognition data, and dictionary management data used to determine the recognition field of a recognition dictionary used in recognition of the speech recognition data are sent to a server via a communication module. In the server, a dictionary management unit looks up an identifier table to determine, from a plurality of kinds of recognition dictionaries, a recognition dictionary corresponding to the dictionary management information received from a client. A speech recognition module recognizes the speech recognition data using at least the determined recognition dictionary. The recognition result is sent to the client via a communication module.



FIG. 6 (flow chart). Client: S101 send user dictionary; S102 send dictionary management information; S103 send speech information; S104 receive recognition result; S105 end? (NO: return to S102). Server: S201 receive user dictionary; S202 receive dictionary management information; S203 dictionary management; S204 receive speech information; S205 speech recognition; S206 send recognition result; S207 end? (NO: return to S202).






Description 

FIELD OF THE INVENTION 

[0001] The present invention relates to a client-server speech recognition system for recognizing speech input at a client by a server, a speech recognition server, a speech recognition client, their control method, and a computer readable memory.

BACKGROUND OF THE INVENTION 

[0002] In recent years, speech has come to be used as an input interface in addition to a keyboard, mouse, and the like.
[0003] However, as the number of recognition words which are to undergo speech recognition becomes larger, the recognition rate of speech recognition lowers and the processing time becomes longer. For this reason, in practice, a plurality of recognition dictionaries or lexicons that register recognition words (e.g., pronunciations and notations) which are to undergo speech recognition are prepared, and are selectively used (a plurality of recognition dictionaries may be used at the same time).
[0004] Also, unregistered words cannot be recognized. As one method of solving this problem, a user dictionary or lexicon (prepared by the user to register recognition words which are to undergo speech recognition) may be used.

[0005] On the other hand, a client-server speech recognition system has been studied to implement speech recognition on a terminal with insufficient resources.
[0006] These three techniques are known to those who are skilled in the art, but a system that combines these three techniques has not been realized yet.

SUMMARY OF THE INVENTION 

[0007] The present invention has been made to solve the above problems, and has as its object to provide a speech recognition system which uses a user dictionary in response to a user's request in a client-server speech recognition system so as to improve speech input efficiency and to reduce the processing load on the entire system, a speech recognition server, a speech recognition client, their control method, and a computer readable memory.

[0008] According to the present invention, the foregoing object is attained by providing a client-server speech recognition system for recognizing speech input at a client by a server,

the client comprising:

    speech input means for inputting speech;
    user dictionary holding means for holding a user dictionary formed by registering target recognition words designated by a user; and
    transmission means for transmitting speech data input by said speech input means, dictionary management information used to determine a recognition field of a recognition dictionary used to recognize the speech data, and the user dictionary to the server, and

the server comprising:

    recognition dictionary holding means for holding a plurality of kinds of recognition dictionaries prepared for respective recognition fields;
    determination means for determining one or more recognition dictionary corresponding to the dictionary management information received from the client from the plurality of kinds of recognition dictionaries; and
    recognition means for recognizing the speech data using at least the recognition dictionary determined by said determination means.

[0009] Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0010]

    Fig. 1 is a block diagram showing the hardware arrangement of a speech recognition system of the first embodiment;
    Fig. 2 is a block diagram showing the functional arrangement of the speech recognition system of the first embodiment;
    Fig. 3 shows the configuration of a user dictionary of the first embodiment;
    Fig. 4 shows a speech input window of the first embodiment;
    Fig. 5 shows an identifier table of the first embodiment;
    Fig. 6 is a flow chart showing the process executed by the speech recognition system of the first embodiment;
    Fig. 7 shows the configuration of a user dictionary appended with input form identifiers according to the third embodiment; and
    Fig. 8 shows the configuration of a user dictionary appended with recognition dictionary identifiers according to the third embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0011] Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[First Embodiment]

[0012] Fig. 1 shows the hardware arrangement of a speech recognition system of the first embodiment.
[0013] A CPU 101 systematically controls an entire client 100. The CPU 101 loads programs stored in a ROM 102 onto a RAM 103, and executes various processes on the basis of the loaded programs. The ROM 102 stores various programs of processes to be executed by the CPU 101. The RAM 103 provides a storage area required to execute various programs stored in the ROM 102.

[0014] A secondary storage device 104 stores an OS and various programs. When the client 100 is implemented not as a general-purpose apparatus such as a personal computer but as a dedicated apparatus, the ROM 102 may store the OS and various programs. By loading the stored programs onto the RAM 103, the CPU 101 can execute processes. As the secondary storage device 104, a hard disk device, floppy disk drive, CD-ROM, or the like may be used. That is, storage media are not particularly limited.
[0015] A network I/F (interface) 105 is connected to a network I/F 205 of a server 200.
[0016] An input device 106 comprises a mouse, keyboard, microphone, and the like to allow input of various instructions to processes to be executed by the CPU 101, and a plurality of these devices can be connected and used simultaneously. An output device 107 comprises a display (CRT, LCD, or the like), and displays information input by the input device 106 and display windows which are controlled by various processes executed by the CPU 101. A bus 108 interconnects various building components of the client 100.
[0017] A CPU 201 systematically controls the entire server 200. The CPU 201 loads programs stored in a ROM 202 onto a RAM 203, and executes various processes on the basis of the loaded programs. The ROM 202 stores various programs of processes to be executed by the CPU 201. The RAM 203 provides a storage area required to execute various programs stored in the ROM 202.

[0018] A secondary storage device 204 stores an OS and various programs. When the server 200 is implemented not as a general-purpose apparatus such as a personal computer but as a dedicated apparatus, the ROM 202 may store the OS and various programs. By loading the stored programs onto the RAM 203, the CPU 201 can execute processes. As the secondary storage device 204, a hard disk device, floppy disk drive, CD-ROM, or the like may be used. That is, storage media are not particularly limited.
[0019] The network I/F 205 is connected to the network I/F 105 of the client 100. A bus 206 interconnects various building components of the server 200.
[0020] The functional arrangement of the speech recognition system of the first embodiment will be described below using Fig. 2.

[0021] Fig. 2 is a block diagram showing the functional arrangement of the speech recognition system of the first embodiment.

[0022] In the client 100, a speech input module 121 inputs speech uttered by the user via a microphone (input device 106), and A/D-converts input speech data (speech recognition data) which is to undergo speech recognition. A communication module 122 sends a user dictionary 124a, speech recognition data 124b, dictionary management information 124c, and the like to the server 200. Also, the communication module 122 receives a speech recognition result of the sent speech recognition data 124b and the like from the server 200.
[0023] A display module 123 displays the speech recognition result received from the server 200 while storing it in, e.g., an input form which is displayed on the output device 107 by the process executed by the speech recognition system of this embodiment.

[0024] In the server 200, a communication module 221 receives the user dictionary 124a, speech recognition data 124b, dictionary management information 124c, and the like from the client 100. Also, the communication module 221 sends the speech recognition result of the speech recognition data 124b and the like to the client 100.

[0025] A dictionary management module 223 switches and selects a plurality of kinds of recognition dictionaries 225 (recognition dictionary 1 to recognition dictionary N, N: a positive integer) prepared for respective recognition fields (e.g., for names, addresses, alphanumeric symbols, and the like), and the user dictionary 124a received from the client 100 (a plurality of kinds of dictionaries may be used simultaneously).

[0026] Note that the plurality of kinds of recognition dictionaries 225 are prepared for each dictionary management information 124c (input form identifier; to be described later) sent from the client 100. Each recognition dictionary 225 is appended with a recognition dictionary identifier indicating the recognition field of that recognition dictionary. The dictionary management module 223 manages an identifier table 223a that stores the recognition dictionary identifiers and input form identifiers in correspondence with each other, as shown in Fig. 5.

[0027] A speech recognition module 224 executes speech recognition using the recognition dictionary or dictionaries 225 and user dictionary 124a designated for speech recognition by the dictionary management module 223 on the basis of the speech recognition data 124b and dictionary management information 124c received from the client 100.

[0028] Note that the user dictionary 124a is prepared by the user to register recognition words which are to undergo speech recognition, and stores pronunciations and notations of words to be recognized in correspondence with each other, as shown in, e.g., Fig. 3.
[0029] The speech recognition data 124b may be either speech data A/D-converted by the speech input module 121 or data obtained by encoding that speech data.

[0030] The dictionary management information 124c indicates an input object and the like. For example, when the server 200 recognizes input speech and the text data corresponding to the speech recognition result is entered in each input form of the speech input window displayed by the speech recognition system of the first embodiment (Fig. 4), the dictionary management information 124c is an identifier (input form identifier) indicating the type of that input form. The client 100 sends this input form identifier to the server 200 as the dictionary management information 124c. In the server 200, the dictionary management module 223 looks up the identifier table 223a to acquire a recognition dictionary identifier corresponding to the received input form identifier, and determines a recognition dictionary 225 to be used in speech recognition.
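
The lookup described in paragraphs [0026] to [0030] can be pictured with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the identifier strings, the dictionary contents, and the function names do not come from the application, and Figs. 3 and 5 are not reproduced. Each dictionary is modelled as a simple mapping from pronunciation to notation.

```python
# Hypothetical identifier table (in the spirit of Fig. 5):
# input form identifier -> recognition dictionary identifiers.
IDENTIFIER_TABLE = {
    "name_form": ["names"],
    "address_form": ["addresses"],
    "code_form": ["alphanumeric"],
}

# Hypothetical recognition dictionaries, one per recognition field:
# recognition dictionary identifier -> {pronunciation: notation}.
RECOGNITION_DICTIONARIES = {
    "names": {"tanaka": "Tanaka"},
    "addresses": {"tokyo": "Tokyo"},
    "alphanumeric": {"ei": "A"},
}

# Hypothetical user dictionary (in the spirit of Fig. 3):
# pronunciation -> notation.
USER_DICTIONARY = {"kyanon": "Canon"}


def determine_dictionary_ids(input_form_id):
    """Look up the identifier table for the received input form identifier."""
    try:
        return list(IDENTIFIER_TABLE[input_form_id])
    except KeyError:
        raise ValueError(f"unknown input form identifier: {input_form_id!r}") from None


def build_vocabulary(input_form_id, user_dictionary):
    """Merge the determined recognition dictionaries with the user dictionary."""
    vocabulary = {}
    for dictionary_id in determine_dictionary_ids(input_form_id):
        vocabulary.update(RECOGNITION_DICTIONARIES[dictionary_id])
    vocabulary.update(user_dictionary)
    return vocabulary
```

In this sketch the words of the user dictionary are always merged in, which corresponds to the behaviour of the first embodiment; only the field-specific part of the vocabulary changes with the input form.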

[0031] The process executed by the speech recognition system of the first embodiment will be explained below using Fig. 6.

[0032] Fig. 6 is a flow chart showing the process executed by the speech recognition system of the first embodiment.

[0033] In step S101, the client 100 sends the user dictionary 124a to the server 200.
[0034] In step S201, the server 200 receives the user dictionary 124a from the client 100.
[0035] In step S102, when speech is input to an input form as a target speech input, the client 100 sends the input form identifier of that input form to the server 200 as the dictionary management information 124c.
[0036] In step S202, the server 200 receives the input form identifier from the client 100 as the dictionary management information 124c.

[0037] In step S203, the server 200 looks up the identifier table 223a using the dictionary management information 124c to acquire a recognition dictionary identifier corresponding to the received input form identifier, and determines a recognition dictionary 225 to be used in speech recognition.

[0038] In step S103, the client 100 sends to the server 200 the speech recognition data 124b, i.e., the speech input whose recognition result is to be entered as text data in each input form.
[0039] In step S204, the server 200 receives the speech recognition data corresponding to each input form from the client 100.

[0040] In step S205, the server 200 executes speech recognition of the speech recognition data 124b in the speech recognition module 224 using the recognition dictionary 225 and user dictionary 124a designated for speech recognition by the dictionary management module 223.

[0041] In the first embodiment, all recognition words contained in the user dictionary 124a sent from the client 100 to the server 200 are used in speech recognition by the speech recognition module 224.
[0042] In step S206, the server 200 sends the speech recognition result obtained by the speech recognition module 224 to the client 100.

[0043] In step S104, the client 100 receives the speech recognition result corresponding to each input form from the server 200, and stores text data corresponding to the speech recognition result in the corresponding input form.

[0044] The client 100 checks in step S105 if the processing is to be ended. If the processing is not to be ended (NO in step S105), the flow returns to step S102 to repeat the processing. On the other hand, if the processing is to be ended (YES in step S105), the client 100 informs the server 200 of end of the processing, and ends the processing.

[0045] The server 200 checks in step S207 if a processing end instruction from the client 100 is detected. If no processing end instruction is detected (NO in step S207), the flow returns to step S202 to repeat the above processes. On the other hand, if the processing end instruction is detected (YES in step S207), the processing ends.
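
The exchange of steps S101 to S105 and S201 to S207 can be traced with a small in-process Python sketch. It is only an illustration under stated assumptions: the class and function names are hypothetical, the network is replaced by direct method calls, and real speech recognition is replaced by a stand-in in which the "speech data" is already a pronunciation string that is looked up in the active vocabulary.

```python
class Server:
    """In-process stand-in for the server 200 (names are hypothetical)."""

    def __init__(self, recognition_dictionaries, identifier_table):
        # recognition dictionary identifier -> {pronunciation: notation}
        self.recognition_dictionaries = recognition_dictionaries
        # input form identifier -> recognition dictionary identifier
        self.identifier_table = identifier_table
        self.user_dictionary = {}
        self.active_dictionary = {}

    def receive_user_dictionary(self, user_dictionary):
        # S201: keep the user dictionary sent by the client.
        self.user_dictionary = dict(user_dictionary)

    def receive_management_info(self, input_form_id):
        # S202 and S203: look up the identifier table and select a dictionary.
        dictionary_id = self.identifier_table[input_form_id]
        self.active_dictionary = self.recognition_dictionaries[dictionary_id]

    def recognize(self, speech_data):
        # S204 to S206: stand-in recognition; returns None when nothing matches.
        vocabulary = {**self.active_dictionary, **self.user_dictionary}
        return vocabulary.get(speech_data)


def run_client(server, user_dictionary, inputs):
    """Drive the client-side steps for a list of (input form id, speech data)."""
    server.receive_user_dictionary(user_dictionary)       # S101
    form_contents = {}
    for input_form_id, speech_data in inputs:             # loop ends at S105
        server.receive_management_info(input_form_id)     # S102
        result = server.recognize(speech_data)            # S103, S104
        form_contents[input_form_id] = result
    return form_contents
```

The loop over `inputs` plays the role of the NO branch of step S105: each pass sends a new input form identifier before the speech for that form, so the server reselects its recognition dictionary on every pass, as the return to step S202 requires.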
[0046] In the above processing, when speech is input to an input form as a target speech input, the dictionary management information 124c corresponding to that input form is sent from the client 100 to the server 200. Alternatively, the dictionary management information 124c may be sent when the input form as a target speech input is focused by an instruction from the input device 106 (the input form as a target speech input is determined).
[0047] In the server 200, speech recognition is made after all speech recognition data 124b are received. Alternatively, every time speech is input as text data to a given input form, that portion of the speech recognition data 124b may be sent to the server 200 frame by frame (for example, one frame is 10 msec of speech data), and speech recognition may be made in real time.
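
As a rough illustration of frame-by-frame transmission, the following Python sketch cuts a sequence of samples into 10 msec frames. The 16 kHz sampling rate is an assumption chosen for the example (the application does not state one), and the function name is hypothetical.

```python
def split_into_frames(samples, sample_rate_hz=16000, frame_ms=10):
    """Split samples into consecutive frames of frame_ms milliseconds.

    The last frame may be shorter when the input length is not a
    multiple of the frame length.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
```

With these assumed values one frame holds 160 samples, so a client could send each frame as soon as it is available instead of waiting for the whole utterance.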
[0048] As described above, according to the first embodiment, in the client-server speech recognition system, since the server 200 executes speech recognition of speech recognition data 124b using both an appropriate recognition dictionary 225 and the user dictionary 124a, the speech recognition precision in the server 200 can be improved while reducing the processing load and use of storage resources associated with speech recognition in the client 100.

[Second Embodiment]

[0049] In the first embodiment, the user dictionary 124a need not be used if no recognition words have been registered in it. The server 200 may therefore use all recognition words in the user dictionary 124a in recognition only when a request to use the user dictionary 124a is received from the client 100.
[0050] In this case, a flag indicating whether the user dictionary 124a is used is added to the dictionary management information 124c, thus informing the server 200 whether or not the user dictionary 124a is to be used.
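
One possible way to picture the flag of the second embodiment is sketched below in Python. The field and function names are hypothetical, and the dictionaries are again modelled as mappings from pronunciation to notation; the application does not prescribe any particular encoding of the flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryManagementInfo:
    """Hypothetical shape of the dictionary management information 124c."""
    input_form_id: str
    use_user_dictionary: bool = False  # flag of the second embodiment


def select_vocabulary(info, recognition_dictionary, user_dictionary):
    """Add the user dictionary words only when the client requested them."""
    vocabulary = dict(recognition_dictionary)
    if info.use_user_dictionary:
        vocabulary.update(user_dictionary)
    return vocabulary
```

When the flag is off, the server searches only the field-specific recognition dictionary, which keeps the vocabulary smaller.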

[Third Embodiment]

[0051] Since some target recognition words in the user dictionary 124a are not used depending on an input object, situation, and the like, only specific recognition words in the user dictionary 124a may be used in recognition depending on the input object and situation.
[0052] In such case, when the user dictionary is managed by designating input form identifiers for respective recognition words, as shown in Fig. 7, only recognition words having an input form identifier of the input form used in speech input can be used in recognition. Alternatively, a plurality of input form identifiers may be designated for a given recognition word. In addition, the user dictionary may be managed by designating recognition dictionary identifiers in place of input form identifiers, as shown in Fig. 8.
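
The filtering of the third embodiment can be sketched in Python as follows. The entry layout is an assumption loosely modelled on the description of Fig. 7 (pronunciation, notation, and one or more input form identifiers per word); the class name, function name, and sample identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedUserWord:
    """Hypothetical user dictionary entry with input form identifiers."""
    pronunciation: str
    notation: str
    input_form_ids: frozenset  # a word may be designated for several forms


def words_for_form(user_dictionary, input_form_id):
    """Keep only the words designated for the input form in use."""
    return [word for word in user_dictionary if input_form_id in word.input_form_ids]
```

The same filter would apply to the Fig. 8 variant by storing recognition dictionary identifiers in place of input form identifiers and matching against the identifier determined by the server.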



[Fourth Embodiment]

[0053] By combining the second and third embodiments, the efficiency of the speech recognition process of the speech recognition module 224 can be further improved.

[Fifth Embodiment]

[0054] Most of the processes of the apparatus of the present invention can be implemented by programs. As described above, since the apparatus can use a general-purpose apparatus such as a personal computer, the present invention is also achieved by supplying a storage medium, which records a program code of a software program that can implement the functions of the above-mentioned embodiments to a system or apparatus, and reading out and executing the program code stored in the storage medium by a computer of the system or apparatus. In this case, the program code itself read out from the storage medium implements the functions of the above-mentioned embodiments, and the storage medium which stores the program code constitutes the present invention. As the storage medium for supplying the program code, for example, a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, magnetic tape, nonvolatile memory card, ROM, and the like may be used.

[0055] The present invention can also be achieved by supplying the storage medium that records the program code to a computer, and by having an OS running on the computer execute some or all of the actual processes. Furthermore, the functions of the above-mentioned embodiments may be implemented by some or all of actual processing operations executed by a CPU or the like arranged in a function extension board or a function extension unit, which is inserted in or connected to the computer, after the program code read out from the storage medium is written in a memory of the extension board or unit. When the present invention is applied to the storage medium, that storage medium stores a program code corresponding to the flow chart shown in Fig. 6.

[0056] As many apparently widely different embodiments of the present invention can be made without departing from the spirit and scope thereof, it is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.

Claims

1. A client-server speech recognition system for recognizing speech input at a client by a server,

    the client comprising:

        speech input means for inputting speech;
        user dictionary holding means for holding a user dictionary formed by registering target recognition words designated by a user; and
        transmission means for transmitting speech data input by said speech input means, dictionary management information used to determine a recognition field of a recognition dictionary used to recognize the speech data, and the user dictionary to the server, and

    the server comprising:

        recognition dictionary holding means for holding a plurality of kinds of recognition dictionaries prepared for respective recognition fields;
        determination means for determining one or more recognition dictionary corresponding to the dictionary management information received from the client from the plurality of kinds of recognition dictionaries; and
        recognition means for recognizing the speech data using at least the recognition dictionary determined by said determination means.

2. The system according to claim 1, wherein said recognition means recognizes the speech data using the recognition dictionary determined by said determination means, and the user dictionary received from the client.

3. The system according to claim 1 or 2, wherein said speech input means comprises display means for displaying an input form as a target speech input, and
    the dictionary management information is an input form identifier that indicates a type of input form.

4. The system according to any of claims 1 to 3, wherein the dictionary management information contains information indicating if the user dictionary is used in recognition of the speech data.

5. The system according to any preceding claim, wherein the user dictionary is formed by storing pronunciations and notations of the target recognition words in correspondence with each other.

6. The system according to claim 3, wherein the user dictionary is formed by also storing at least one input form identifier and the target recognition words in correspondence with each other.

7. The system according to any preceding claim, wherein the user dictionary is formed by also storing at least one of recognition dictionary identifiers indicating recognition fields of the plurality of kinds of recognition dictionaries, and the target recognition words.

8. The system according to any preceding claim, wherein the speech data is data obtained by encoding that speech data.

9. A method of controlling a client-server speech recognition system for recognizing speech input at a client by a server, comprising:

    a speech input step of inputting speech;
    a user dictionary holding step of holding, in the client, a user dictionary formed by registering target recognition words designated by a user; and
    a transmission step of transmitting speech data input in the speech input step, dictionary management information used to determine a recognition field of a recognition dictionary used to recognize the speech data, and the user dictionary to the server;
    a recognition dictionary holding step of holding, in the server, a plurality of kinds of recognition dictionaries prepared for respective recognition fields;
    a determination step of determining one or more recognition dictionary corresponding to the dictionary management information received from the client from the plurality of kinds of recognition dictionaries; and
    a recognition step of recognizing the speech data using at least the recognition dictionary determined in the determination step.

10. The method according to claim 9, wherein the recognition step includes a step of recognizing the speech data using the recognition dictionary determined in the determination step, and the user dictionary received from the client.

11. The method according to claim 9 or 10, wherein the speech input step comprises a display step of displaying an input form as a target speech input, and
    the dictionary management information is an input form identifier that indicates a type of input form.

12. The method according to any of claims 9 to 11, wherein the dictionary management information contains information indicating if the user dictionary is used in recognition of the speech data.

13. The method according to any of claims 9 to 12, wherein the user dictionary is formed by storing pronunciations and notations of the target recognition words in correspondence with each other.

14. The method according to claim 11, wherein the user dictionary is formed by also storing at least one input form identifier and the target recognition words in correspondence with each other.

15. The method according to any of claims 9 to 14, wherein the user dictionary is formed by also storing at least one of recognition dictionary identifiers indicating recognition fields of the plurality of kinds of recognition dictionaries, and the target recognition words.

16. The method according to any of claims 9 to 15, wherein the speech data is data obtained by encoding that speech data.

17. A computer readable memory that stores a program code of control of a client-server speech recognition system for recognizing speech input at a client by a server, comprising:

    a program code of a speech input step of inputting speech;
    a program code of a user dictionary holding step of holding, in the client, a user dictionary formed by registering target recognition words designated by a user; and
    a program code of a transmission step of transmitting speech data input in the speech input step, dictionary management information used to determine a recognition field of a recognition dictionary used to recognize the speech data, and the user dictionary to the server;
    a program code of a recognition dictionary holding step of holding, in the server, a plurality of kinds of recognition dictionaries prepared for respective recognition fields;
    a program code of a determination step of determining one or more recognition dictionary corresponding to the dictionary management information received from the client from the plurality of kinds of recognition dictionaries; and
    a program code of a recognition step of recognizing the speech data using at least the recognition dictionary determined in the determination step.

18. A speech recognition server for recognizing speech input at a client, and sending a recognition result to the client, comprising:

    reception means for receiving, from the client, speech data, dictionary management information used to determine a recognition field of a recognition dictionary used to recognize the speech data, and a user dictionary formed by registering target recognition words designated by a user;
    recognition dictionary holding means for holding a plurality of kinds of recognition dictionaries prepared for respective recognition fields;
    determination means for determining one or more recognition dictionary corresponding to the dictionary management information received from the client from the plurality of kinds of recognition dictionaries; and
    recognition means for recognizing the speech data using at least the recognition dictionary determined by said determination means.

19. The server according to claim 18, wherein said recognition means recognizes the speech data using the recognition dictionary determined by said determination means, and the user dictionary received from the client.

20. The server according to claim 18 or 19, wherein the speech data is data obtained by encoding that speech data.

21. A speech recognition client for sending input 
speech to be recognized to a server, and receiving 
a recognition result of that speech, comprising: 

speech input means for inputting speech; 
user dictionary holding means for holding a us- 
er dictionary formed by registering target rec- 
ognition words designated by a user; and 
transmission means for transmitting speech 
data input by said speech Input means, diction- 



ary management infomnation used to deter- 
mine a recognition field of a recognition diction- 
ary used to recognize the speech data, and the 
user dictionary to the server. 

5 

22. The client according to claim 21, wherein said 
speech input means comprises display means for 
displaying an input fomn as a target speech input, 
and 

10 the dictionary management infomnation is an 

input fomri identifier that indicates a type of input 
form. 

23. The client according to claim 21 or 22, wherein the 
^5 dictionary management infomiation contains infor- 
mation indicating if the userdictionary is used in rec- 
ognition of the speech data. 

24. The client according to any of claims 21 to 23, 
20 wherein the user dictionary is formed by storing pro- 
nunciations and notations of the target recognition 
words in correspondence with each other. 

25. The client according to claim 22, wherein the user 
25 dictionary is formed by also storing at least one in- 
put fomn identifier and the target recognition words 
in correspondence with each other. 

''26. The client according to any of claims 21 to 25, 
30 wherein the user dictionary is formed by also storing 
at least one of recognition dictionary identifiers in- 
dicating recognition fields of the plurality of kinds of 
recognition dictionaries, and the target recognition 
words. 
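
The user dictionary described in claims 24 to 26 can be pictured as a small record type. The following is a minimal sketch only, not the applicant's implementation: the class name `UserEntry`, its field names, and the rule that an entry with no input form identifiers applies to every form are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UserEntry:
    """One target recognition word registered by the user (hypothetical layout)."""
    pronunciation: str                      # claim 24: pronunciation
    notation: str                           # claim 24: notation
    input_form_ids: Tuple[str, ...] = ()    # claim 25: input form identifiers
    dictionary_ids: Tuple[str, ...] = ()    # claim 26: recognition dictionary identifiers


def entries_for_form(entries: List[UserEntry], input_form_id: str) -> List[UserEntry]:
    """Select the entries usable for a given input form.

    Assumption: an entry that lists no input form identifiers is usable everywhere.
    """
    return [
        e for e in entries
        if not e.input_form_ids or input_form_id in e.input_form_ids
    ]


user_dictionary = [
    UserEntry("kushida", "Kushida", input_form_ids=("form_name",)),
    UserEntry("ohtaku", "Ohta-ku", input_form_ids=("form_address",),
              dictionary_ids=("address",)),
    UserEntry("canon", "Canon"),
]
```

Here `entries_for_form(user_dictionary, "form_name")` would keep the "Kushida" and "Canon" entries and drop the address entry.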

27. The client according to any of claims 21 to 25, 
wherein the speech data is data obtained by encod-
ing that speech data.

28. A method of controlling a speech recognition server
for recognizing speech input at a client, and sending
a recognition result to the client, comprising: 

a reception step of receiving, from the client, 
speech data, dictionary management informa-
tion used to determine a recognition field of a
recognition dictionary used to recognize the
speech data, and a user dictionary formed by
registering target recognition words designated
by a user;

a recognition dictionary holding step of holding 
a plurality of kinds of recognition dictionaries 
prepared for respective recognition fields; 
a determination step of determining one or 
more recognition dictionary corresponding to
the dictionary management information re-
ceived from the client from the plurality of kinds
of recognition dictionaries; and
a recognition step of recognizing the speech
data using at least the recognition dictionary
determined in the determination step.
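
The four steps of claim 28 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the dictionary contents, the identifier table, and the names `Request`, `determine_dictionaries` and `recognize` are hypothetical, and acoustic recognition is replaced by an exact lookup of a pronunciation string.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Recognition dictionary holding step (hypothetical contents):
# one dictionary per recognition field, mapping pronunciation -> notation.
RECOGNITION_DICTIONARIES: Dict[str, Dict[str, str]] = {
    "name": {"suzuki": "Suzuki", "tanaka": "Tanaka"},
    "address": {"tokyo": "Tokyo", "osaka": "Osaka"},
}

# Hypothetical identifier table: dictionary management information
# (here an input form identifier) -> recognition fields to use.
IDENTIFIER_TABLE: Dict[str, List[str]] = {
    "form_name": ["name"],
    "form_address": ["address"],
    "form_free": ["name", "address"],
}


@dataclass
class Request:
    """What the reception step receives from the client."""
    speech_data: str        # stand-in for encoded speech: a pronunciation string
    management_info: str    # dictionary management information
    user_dictionary: Dict[str, str] = field(default_factory=dict)


def determine_dictionaries(management_info: str) -> List[Dict[str, str]]:
    """Determination step: pick one or more recognition dictionaries."""
    fields = IDENTIFIER_TABLE.get(management_info, [])
    return [RECOGNITION_DICTIONARIES[f] for f in fields]


def recognize(request: Request) -> Optional[str]:
    """Recognition step: search the determined dictionaries, then the user dictionary."""
    candidates = determine_dictionaries(request.management_info)
    candidates.append(request.user_dictionary)
    for dictionary in candidates:
        if request.speech_data in dictionary:
            return dictionary[request.speech_data]
    return None  # no dictionary contains the input
```

With these sample tables, a request for "tokyo" on `form_address` resolves through the address dictionary, while the same input on `form_name` finds no match unless the user dictionary supplies one.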

29. The method according to claim 28, wherein the rec- 
ognition step includes a step of recognizing the 
speech data using the recognition dictionary deter-
mined in the determination step, and the user dic-
tionary received from the client.

30. The method according to claim 28 or 29, wherein 
the speech data is data obtained by encoding that 
speech data. 

31. A method of controlling a speech recognition client
for sending input speech to be recognized to a serv- 
er, and receiving a recognition result of that speech, 
comprising: 

a speech input step of inputting speech; 
a user dictionary holding step of holding a user 
dictionary formed by registering target recogni-
tion words designated by a user; and
a transmission step of transmitting speech data 
input in the speech input step, dictionary man-
agement information used to determine a rec-
ognition field of a recognition dictionary used to
recognize the speech data, and the user dic-
tionary to the server.
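
The transmission step of claim 31 sends three things together: the speech data, the dictionary management information, and the user dictionary. A minimal sketch of one possible message layout is given below. The JSON encoding, the key names, and the function `build_request` are assumptions for illustration; the claims do not prescribe any wire format.

```python
import base64
import json
from typing import Dict


def build_request(speech_bytes: bytes, input_form_id: str,
                  user_dictionary: Dict[str, str]) -> str:
    """Bundle the three items named in the transmission step into one message.

    Hypothetical format: JSON text with base64-encoded speech data.
    """
    message = {
        # speech data input in the speech input step
        "speech_data": base64.b64encode(speech_bytes).decode("ascii"),
        # dictionary management information (here an input form identifier)
        "dictionary_management": {"input_form_id": input_form_id},
        # user dictionary: pronunciation -> notation
        "user_dictionary": user_dictionary,
    }
    return json.dumps(message)


payload = build_request(b"\x00\x01\x02", "form_name", {"kushida": "Kushida"})
```

The server side of such a scheme would decode `speech_data` with `base64.b64decode` and read the input form identifier to choose its recognition dictionaries.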

32. The method according to claim 31, wherein the 
speech input step comprises a display step of dis-
playing an input form as a target speech input, and
the dictionary management information is an
input form identifier that indicates a type of input
form.

33. The method according to claim 31 or 32, wherein 
the dictionary management information contains in-
formation indicating if the user dictionary is used in
recognition of the speech data.

34. The method according to any of claims 31 to 33, 
wherein the user dictionary is formed by storing pro-
nunciations and notations of the target recognition
words in correspondence with each other. 

35. The method according to claim 32, wherein the user 
dictionary is formed by also storing at least one in- 
put form identifier and the target recognition words 
in correspondence with each other. 

36. The method according to any of claims 31 to 35, 
wherein the user dictionary is formed by also storing
at least one of recognition dictionary identifiers in- 
dicating recognition fields of the plurality of kinds of 
recognition dictionaries, and the target recognition 
words. 



37. The method according to any of claims 31 to 36, 
wherein the speech data is data obtained by encod- 
ing that speech data. 

38. A computer readable memory that stores a program
code of control of a speech recognition server for 
recognizing speech input at a client, and sending a 
recognition result to the client, comprising: 

a program code of a reception step of receiving,
from the client, speech data, dictionary man-
agement information used to determine a rec-
ognition field of a recognition dictionary used to
recognize the speech data, and a user diction-
ary formed by registering target recognition
words designated by a user;
a program code of a recognition dictionary hold-
ing step of holding a plurality of kinds of recog-
nition dictionaries prepared for respective rec-
ognition fields;
a program code of a determination step of de-
termining one or more recognition dictionary
corresponding to the dictionary management
information received from the client from the
plurality of kinds of recognition dictionaries; and

a program code of a recognition step of recog- 
nizing the speech data using at least the rec- 
ognition dictionary determined in the determi- 
nation step. 

39. A computer readable memory that stores a program 
code of control of a speech recognition client for 
sending input speech to be recognized to a server, 
and receiving a recognition result of that speech, 
comprising:

a program code of a speech input step of input-
ting speech;
a program code of a user dictionary holding
step of holding a user dictionary formed by reg-
istering target recognition words designated by
a user; and
a program code of a transmission step of trans-
mitting speech data input in the speech input
step, dictionary management information used
to determine a recognition field of a recognition
dictionary used to recognize the speech data,
and the user dictionary to the server.

40. A client-server speech recognition system for rec-
ognizing speech input at a client by a server,

the client comprising: 

a speech input unit inputs speech;
a user dictionary holding a user dictionary
formed by registering target recognition
words designated by a user; and
a transmitter transmits speech data input
by said speech input means, dictionary
management information used to deter-
mine a recognition field of a recognition dic-
tionary used to recognize the speech data,
and the user dictionary to the server, and

the server comprising: 

a recognition dictionary holding a plurality 
of kinds of recognition dictionaries pre- 
pared for respective recognition fields; 
a determination unit determines one or
more recognition dictionary corresponding
to the dictionary management information
received from the client from the plurality
of kinds of recognition dictionaries; and
a recognition unit recognizes the speech
data using at least the recognition diction-
ary determined by said determination
means.

41. A speech recognition server for recognizing speech 
input at a client, and sending a recognition result to 
the client, comprising: 

a receiver receives, from the client, speech da- 
ta, dictionary management information used to 
determine a recognition field of a recognition 
dictionary used to recognize the speech data, 
and a user dictionary formed by registering tar- 
get recognition words designated by a user; 
a recognition dictionary holding a plurality of 
kinds of recognition dictionaries prepared for 
respective recognition fields; 
a determination unit determines one or more 
recognition dictionary corresponding to the dic- 
tionary management information received from 
the client from the plurality of kinds of recogni- 
tion dictionaries; and 

a recognition unit recognizes the speech data 
using at least the recognition dictionary deter- 
mined by said determination means. 

42. A speech recognition client for sending input 
speech to be recognized to a server, and receiving 
a recognition result of that speech, comprising: 

a speech input unit inputs speech; 
a user dictionary holding a user dictionary 
formed by registering target recognition words
designated by a user; and 
a transmitter transmits speech data input by 
said speech input means, dictionary manage-
ment information used to determine a recogni-
tion field of a recognition dictionary used to rec-
ognize the speech data, and the user dictionary 
to the server. 



43. Processor implementable instructions product for 
causing a programmable computer device to carry 
out the method of any of claims 28 to 37, when the 
instructions product is run on said programmable 
computer device. 